Centre for Multi-Dimensional Data Visualisation (MuViSU) muvisu@sun.ac.za
SASA 2024
Introduction
The biplotEZ package aims to provide users with EZier software to construct biplots.
What is a biplot?
Visualisation of multi-dimensional data in 2 or 3 dimensions.
A brief history of biplots and biplotEZ.
History (1)
1971
Gabriel, K.R., The biplot graphic display of matrices with application to principal component analysis. Biometrika, 58(3), pp.453-467.
1976
Prof Niël J le Roux presents a seminar on biplots.
History (2)
1996
John Gower publishes Biplots with David Hand.
Prof le Roux introduces a Masters module on Biplots (Multidimensional scaling).
Rika Cilliers obtains her Masters on biplots for socio-economic progress under Prof le Roux.
1997
SASA conference paper: S-PLUS Functions for Interactive Linear and Non-linear Biplots by SP van Blerk, NJ le Roux & S Gardner.
History (3)
2001
Sugnet Gardner (Lubbe) obtains her PhD on biplots under Prof le Roux.
Louise Wood obtains her Masters on biplots for socio-economic development under Prof le Roux.
2003: Adele Bothma obtains her Masters on biplots for school results under Prof le Roux.
2007: Idele Walters obtains her Masters on biplots for exploring the gender gap under Prof le Roux.
2008: Ryan Wedlake obtains his Masters on robust biplots under Prof le Roux.
History (4)
2009
BiplotGUI for Interactive Biplots, Anthony le Grange.
2010: André Mostert obtains his Masters on biplots in industry under Prof le Roux.
2011
John Gower, Sugnet Lubbe and Niël le Roux publish Understanding Biplots.
The R package UBbipl was developed to accompany the book, but was never published.
History (5)
2013: Hilmarie Brand obtains her Masters on PCA and CVA biplots under Prof le Roux.
2014: Opeoluwe Oyedele obtains her PhD on Partial Least Squares biplots under Sugnet Lubbe.
2015: Ruan Rossouw obtains his PhD on using biplots for multivariate process monitoring under Prof le Roux.
2016: Ben Gurr obtains his Masters on biplots for crime data under Prof le Roux.
2019
Johané Nienkemper-Swanepoel obtains her PhD on MCA biplots under Prof le Roux and Sugnet Lubbe.
Carel van der Merwe obtains his PhD using biplots. Carel supervises four Master's projects on biplots: Justin Perrang, Francesca van Niekerk, David Rodwell and Delia Sandilands.
History (6)
2020
Raeesa Ganey obtains her PhD on Principal Surface Biplots under Sugnet Lubbe.
André Mostert obtains his PhD on multidimensional scaling for identification of contributions to out of control multivariate processes under Sugnet Lubbe.
Adriaan Rowen obtains his Masters using biplots to understand black-box machine learning models.
2022
Zoë-Mae Adams obtains her Masters on biplots in sentiment classification under Johané Nienkemper-Swanepoel.
History (7)
2023
bipl5 for Exploding Biplots, Ruan Buys.
2024
Ruan Buys obtains his Masters on Exploding biplots under Carel van der Merwe.
Adriaan Rowen to submit his PhD using biplots to understand black-box machine learning models.
Peter Manefeldt to submit his Masters using multidimensional scaling for interpretability of random forest models.
The Team
What are biplots?
The biplot is a powerful and very useful data visualisation tool.
Biplots make the information in a table of data transparent, revealing the main structures in the data in a methodical way, for example patterns of correlation between variables or similarities between observations.
A biplot is a generalisation of a two-dimensional scatter diagram of data that exists in a higher dimensional space, where information on both samples and variables can be displayed graphically.
There are different types of biplots that are based on various multivariate data analysis techniques.
The simplest type of biplot is based on Principal Component Analysis.
Geometrically the rows of \({\bf{X}}\) are given as coordinates of \(n\) samples in the \(p\)-dimensional space \(\mathbb{R}^p\).
The aim is to find an \(r\)-dimensional plane containing the points whose coordinates are given by the rows of \({\bf{\hat{X}}}_{[r]}\), minimising the least squares criterion \[\begin{equation}
|| {\bf{X}} - {\bf{\hat{X}}}_{[r]}||^2 = tr\{({\bf{X}} - {\bf{\hat{X}}}_{[r]})({\bf{X}} - {\bf{\hat{X}}}_{[r]})'\}.
\end{equation}\]
The best approximation minimising this criterion is the \(r\)-dimensional Eckart-Young approximation \({\bf{\hat{X}}}_{[r]} = {\bf{U}} {\bf{D}}_{[r]} {\bf{V}}'\), where \({\bf{X}} = {\bf{U}}{\bf{D}}{\bf{V}}'\) is the singular value decomposition of \({\bf{X}}\) and \({\bf{D}}_{[r]}\) retains only the \(r\) largest singular values.
Representing samples
A standard result when \(r=2\) is that the row vectors of \({\bf{\hat{X}}}_{[2]}\) are the orthogonal projections of the corresponding row vectors of \({\bf{X}}\) onto the column space of \({\bf{V}}_2\). The projections are therefore,
\[\begin{equation}
{\bf{X}} {\bf{V}}_2.
\end{equation}\] These projections are also known as the first two principal components.
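As a minimal base-R sketch (not the biplotEZ implementation), both the Eckart-Young approximation and these scores can be computed directly from the singular value decomposition; the state.x77 data set is used purely for illustration:
Xc  <- scale(state.x77, center = TRUE, scale = FALSE)  # column-centred data matrix
udv <- svd(Xc)                                         # X = U D V'
Xhat2 <- udv$u[, 1:2] %*% diag(udv$d[1:2]) %*% t(udv$v[, 1:2])  # Eckart-Young, r = 2
sum((Xc - Xhat2)^2)              # the least squares criterion ||X - Xhat_[2]||^2
Z <- Xc %*% udv$v[, 1:2]         # projections X V_2: the first two principal components
all.equal(Z, udv$u[, 1:2] %*% diag(udv$d[1:2]), check.attributes = FALSE)  # same scores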
Representing variables
The columns of \({\bf{X}}\) are approximated using the rows of \({\bf{V}}_2\) (the first two columns of \({\bf{V}}\)), which represent the axis directions for each variable.
Calibrated biplot axes
We have constructed a biplot, but the variables represented by the vectors (arrows) have no calibration.
That is, there are no markers on the vectors representing the variables, analogous to the markers on the axes of an ordinary scatterplot.
To construct a biplot axis with relevant markers for a variable, a \((p-1)\)-dimensional hyperplane \(\mathscr{N}\) perpendicular to the corresponding Cartesian axis is required.
First plane
In this example \(p = 3\); therefore a two-dimensional plane \(\mathscr{N}\) is constructed perpendicular to the \(X_1\)-axis through a specific value of \(X_1\), say \(\mu\).
The intersection of the biplot plane \(\mathscr{L}\) and \(\mathscr{N}\) is an \((r-1)\)-dimensional intersection space, in this case a line. All the points on this intersection line in \(\mathscr{L}\) predict the value \(\mu\) for the \(X_1\)-axis.
Second plane
The plane \(\mathscr{N}\) is shifted orthogonally through another value on \(X_1\), and all the points on the intersection line of \(\mathscr{L}\) and \(\mathscr{N}\) predict the value through which the plane passes.
Intersection lines
As the plane \(\mathscr{N}\) is shifted along the \(X_1\)-axis, a series of parallel intersection spaces is obtained.
Any line passing through the origin will pass through these intersection spaces and can be used as an axis fitted with markers according to the value associated with the particular intersection space.
To facilitate orthogonal projection onto the axis, similar to an ordinary scatterplot, the line orthogonal to these intersection spaces is chosen.
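A small sketch of this calibration for a PCA biplot (assuming column-centred data, with state.x77 as illustrative data): the axis for variable \(k\) runs through the origin in the direction of \(\mathbf{v}_k\), row \(k\) of \(\mathbf{V}_2\), and the marker predicting value \(\mu\) lies at \(\mu\mathbf{v}_k/\|\mathbf{v}_k\|^2\):
Xc <- scale(state.x77, center = TRUE, scale = FALSE)
V2 <- svd(Xc)$v[, 1:2]
k  <- 1                                  # calibrate the axis of the first variable
vk <- V2[k, ]                            # direction of the biplot axis for variable k
mu <- pretty(range(state.x77[, k]))      # "nice" marker values on the original scale
markers <- outer((mu - mean(state.x77[, k])) / sum(vk^2), vk)  # marker coordinates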
Correspondence Analysis
Aims to expose the association between two categorical variables.
Categorical variables measure characteristics of individuals (samples) in the form of finite discrete response levels (category levels).
Summarised in a two-way contingency table.
Focus placed on nominal categorical variables - category levels with no specific rank / order.
Numerous variants of CA are available for application to diverse problems; the interested reader is referred to the references in the vignettes.
With biplotEZ, focus is placed on three EZ-to-use variants (more information to follow).
Correspondence Analysis
The data matrix in CA() differs from that in PCA() and CVA().
\(\mathbf{X}: r \times c\) represents the cross-tabulation of two categorical variables.
The elements of the data matrix represent the frequency of the co-occurrence of two particular levels of the two variables.
Consider the HairEyeColor data set in R, which summarises the hair and eye colour of male and female statistics students. For the purpose of this example only the female students (HairEyeColor[,,2]) will be considered:
cross_tab <- HairEyeColor[,,2]
cross_tab
# Eye
# Hair Brown Blue Hazel Green
# Black 36 9 5 2
# Brown 66 34 29 14
# Red 16 7 7 7
# Blond 4 64 5 8
Current functions for CA() in biplotEZ
biplot() |>
  CA() |>
  interpolate() |> fit.measures() |>
  samples() |> newsamples() |>
  legend.type() |>
  plot()
Take note of the warning message:
biplot(HairEyeColor[,,2], center = TRUE) |> CA()
# Warning in CA.biplot(biplot(HairEyeColor[, , 2], center = TRUE)): Centering was
# not performed. Set biplot(center = FALSE) when performing CA().
# Object of class CA, based on 4 row levels and 4 column levels.
CA calculations
It is typical to express the frequencies in terms of proportions / probabilities.
Consider the correspondence matrix \(\mathbf{P}\):
P_mat <- cross_tab/sum(cross_tab)
P_mat
# Eye
# Hair Brown Blue Hazel Green
# Black 0.115015974 0.028753994 0.015974441 0.006389776
# Brown 0.210862620 0.108626198 0.092651757 0.044728435
# Red 0.051118211 0.022364217 0.022364217 0.022364217
# Blond 0.012779553 0.204472843 0.015974441 0.025559105
CA calculations
Row masses (diagonal matrix) - \(\mathbf{D_r}\):
ca.out <- biplot(HairEyeColor[,,2], center = FALSE) |> CA()
ca.out$Dr
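The same masses can be obtained directly from the correspondence matrix P_mat computed earlier; a minimal sketch (the column masses \(\mathbf{D_c}\) follow analogously):
Dr <- diag(rowSums(P_mat))   # row masses: row sums of P on the diagonal
Dc <- diag(colSums(P_mat))   # column masses: column sums of P on the diagonal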
The inner product is invariant when \(\mathbf{A}\) and \(\mathbf{B}\) are scaled inversely by \(\lambda\), where \(\mathbf{A}\) represents the row coordinates and \(\mathbf{B}\) represents the column coordinates.
\[
\mathbf{AB} = (\lambda\mathbf{A})(\mathbf{B}/\lambda)
\] An arbitrary value of \(\lambda\) can be selected, or an optimal value can be used to ensure that the average squared distances of the points in \(\lambda\mathbf{A}\) and \(\mathbf{B}/\lambda\) from the origin are equal.
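A minimal sketch of such an optimal \(\lambda\) (assuming \(\mathbf{A}\) and \(\mathbf{B}\) hold the row and column coordinates as rows): equating the average squared distances gives \(\lambda^4 = \text{mean}\|\mathbf{b}_j\|^2 / \text{mean}\|\mathbf{a}_i\|^2\):
opt_lambda <- function(A, B) {
  msd_A <- mean(rowSums(A^2))  # average squared distance from the origin, row points
  msd_B <- mean(rowSums(B^2))  # average squared distance from the origin, column points
  (msd_B / msd_A)^(1/4)        # lambda^4 = msd_B / msd_A
}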
ca.out <- biplot(HairEyeColor[,,2], center = FALSE) |> CA(variant = "Symmetric") |> fit.measures()
Quality:
ca.out$quality
# [1] 0.9833094
Adequacy:
ca.out$adequacy
# Brown Blue Hazel Green
# Dim 1 0.3824 0.5745 0.0425 0.0006
# Dim 2 0.5892 0.6277 0.3904 0.3926
# Dim 3 0.6102 0.6358 0.8530 0.9010
# Dim 4 1.0000 1.0000 1.0000 1.0000
Fit measures
ca.out <- biplot(HairEyeColor[,,2], center = FALSE) |> CA(variant = "Symmetric") |> fit.measures()
Row predictivities:
ca.out$row.predictivities
# Black Brown Red Blond
# Dim 1 0.6860 0.8630 0.4219 0.9974
# Dim 2 0.9932 0.9441 0.8508 0.9999
# Dim 3 1.0000 1.0000 1.0000 1.0000
# Dim 4 1.0000 1.0000 1.0000 1.0000
Column predictivities:
ca.out$col.predictivities
# Brown Blue Hazel Green
# Dim 1 0.9439 0.9898 0.4789 0.0113
# Dim 2 0.9990 0.9997 0.9020 0.8177
# Dim 3 1.0000 1.0000 1.0000 1.0000
# Dim 4 1.0000 1.0000 1.0000 1.0000
Canonical Variate Analysis (CVA) biplot
Aim: a dimension reduction technique that maximises between-class variation while minimising within-class variation.
This is achieved by the following steps:
Decompose the variance.
Find a linear mapping to the canonical space.
Find a low-dimensional approximation.
Variance Decomposition
The classical variance decomposition \[\mathbf{T}=\mathbf{B}+\mathbf{W},\]
has as an analogy in this setting \[
\mathbf{X'X} = \mathbf{\bar{\mathbf{X}}'C \bar{\mathbf{X}}} + \mathbf{X' [I - G(G'G)^{-1}C(G'G)^{-1}G'] X}.
\]
The choice of \(\mathbf{C}\) determines the variant of CVA (weighted, unweighted centred or unweighted identity; see the weightedCVA argument below). In each case a two-sided eigenvalue problem is solved, finding \(\mathbf{M}\) with \(\mathbf{M'BM}= \mathbf{\Lambda}\) and \(\mathbf{M'WM}= \mathbf{I}\).
Since the matrix \(\mathbf{W}^{-\frac{1}{2}} \mathbf{B} \mathbf{W}^{-\frac{1}{2}}\) is symmetric and positive semi-definite, the eigenvalues in the matrix \(\mathbf{\Lambda}\) are non-negative and ordered. The rank of \(\mathbf{B}\) is \(\min(p, G-1)\), where \(G\) is the number of classes, so only the first \(\text{rank}(\mathbf{B})\) eigenvalues are non-zero. We form the canonical variates with the transformation
\[\mathbf{\bar{Z}}=\mathbf{\bar{Y}}\mathbf{J}_2=\mathbf{\bar{X}}\mathbf{M}\mathbf{J}_2 \tag{7}\] where \(\mathbf{J'}_2=[\mathbf{I}_2 \quad \mathbf{0}]\). We add the individual sample points with the same transformation \[\mathbf{Z}=\mathbf{X}\mathbf{M}\mathbf{J}_2. \tag{8}\]
A new sample point, \(\mathbf{x}^*\), can be added by interpolation \[\mathbf{z}^*=\mathbf{x}^*\mathbf{M}\mathbf{J}_2.\tag{9}\]
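A hedged base-R sketch of these steps for the weighted variant (\(\mathbf{C} = \mathbf{G'G}\)), mirroring the algebra above rather than biplotEZ's internal code; state.x77 and state.region serve as illustrative data:
X <- scale(state.x77, center = TRUE, scale = FALSE)    # centred data matrix
G <- model.matrix(~ state.region - 1)                  # class indicator matrix
Xbar <- solve(t(G) %*% G) %*% t(G) %*% X               # class means
B <- t(Xbar) %*% (t(G) %*% G) %*% Xbar                 # between-class matrix
W <- t(X) %*% X - B                                    # within-class matrix
We <- eigen(W)
W_half_inv <- We$vectors %*% diag(1 / sqrt(We$values)) %*% t(We$vectors)  # W^(-1/2)
M <- W_half_inv %*% eigen(W_half_inv %*% B %*% W_half_inv)$vectors  # M'WM = I, M'BM = Lambda
Zbar <- Xbar %*% M[, 1:2]   # canonical class means, equation (7)
Z    <- X %*% M[, 1:2]      # canonical sample points, equation (8)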
CVA function
CVA()
bp: object of class biplot.
classes: vector of class membership; user specified, otherwise defaults to the vector specified in biplot().
dim.biplot: dimension of the biplot; only values 1, 2 and 3 are accepted, with default 2.
e.vects: which eigenvectors (canonical variates) to extract, with default 1:dim.biplot.
weightedCVA: "weighted", "unweightedCent" or "unweightedI"; controls which type of CVA to perform, with default "weighted".
show.class.means: TRUE or FALSE; controls whether class means are plotted, with default TRUE.
low.dim: "sample.opt" or "Bhattacharyya.dist"; controls the method of constructing additional dimension(s) if dim.biplot exceeds the number of classes minus one, with default "sample.opt".
Class means
The means() function allows the user to make adjustments to the points representing the class means.
Plotting only a selection of the class means
which: a vector containing the groups or classes for which the means should be displayed, with default bp$g.
The following arguments control the aesthetic options for the plotted class mean points:
col: the colour(s) of the means, defaulting to the colour of the samples.
pch: the plotting character(s) of the means, with default 15.
cex: the character expansion(s) of the means, with default 1.
opacity: the transparency of the means.
shade.darker: a logical value indicating whether the colour of the mean points should be made a shade darker than the default or specified colour, with default TRUE.
Class means
Colours and characters for the points
biplot(state.x77) |>
  CVA(state.region) |>
  means(cex = c(1,2,3,4), col = "red", pch = c(9,13,4,16)) |>
  plot()
Class means
Labels
The following arguments control the aesthetic options for the labels accompanying the plotted class mean points:
label: a logical value indicating whether the means should be labelled, with default TRUE.
label.col: a vector of the same length as which containing label colours for the means, defaulting to the colour of the means.
label.cex: a vector of the same length as which containing label text expansions for the means, with default 0.75.
label.side: the side on which the label of the plotted mean point appears, with default "bottom".
label.offset: the offset of the label from the plotted mean point.
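A sketch of labelled class means; the value "top" for label.side is an assumption based on the usual plotting sides:
biplot(state.x77) |>
  CVA(classes = state.region) |>
  means(label = TRUE, label.cex = 1, label.side = "top") |>  # "top" is assumed valid
  plot()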
Console output when ellipses are computed for the classes:
# Computing 1.95 -ellipse for Northeast
# Computing 2.15 -ellipse for South
# Computing 2.45 -ellipse for North Central
# Computing 3.03 -ellipse for West
Measures of fit
bp <- biplot(state.x77) |> CVA(classes = state.region) |> fit.measures()
The object contains the following information on how well the biplot represents the original and canonical spaces:
quality: quality of fit of the canonical and original variables
adequacy: adequacy of the original variables
axis.predictivity: axis predictivity
class.predictivity: class predictivity
within.class.axis.predictivity: within-class axis predictivity
The summary() function prints to screen the fit.measures stored in the object of class biplot.
bp |> summary()
# Object of class biplot, based on 50 samples and 8 variables.
# 8 numeric variables.
# 4 classes: Northeast South North Central West
#
# Quality of fit of canonical variables in 2 dimension(s) = 91.9%
# Quality of fit of original variables in 2 dimension(s) = 93.4%
# Adequacy of variables in 2 dimension(s):
# Population Income Illiteracy Life Exp Murder HS Grad
# 0.453533269 0.105327455 0.107221535 0.002201286 0.208653101 0.687840023
# Frost Area
# 0.452308013 0.118544323
# Axis predictivity in 2 dimension(s):
# Population Income Illiteracy Life Exp Murder HS Grad Frost
# 0.9873763 0.9848608 0.8757913 0.9050208 0.9955088 0.9970346 0.9558192
# Area
# 0.9344651
# Class predictivity in 2 dimension(s):
# Northeast South North Central West
# 0.8031465 0.9985089 0.6449906 0.9988469
# Within class axis predictivity in 2 dimension(s):
# Population Income Illiteracy Life Exp Murder HS Grad Frost
# 0.02246821 0.10349948 0.27870637 0.21460313 0.29836047 0.87510975 0.22320989
# Area
# 0.13603927
# Within class sample predictivity in 2 dimension(s):
# Alabama Alaska Arizona Arkansas California
# 0.769417280 0.174566384 0.328610375 0.148035077 0.103141908
# Colorado Connecticut Delaware Florida Georgia
# 0.357627854 0.079176621 0.438089663 0.327270922 0.558038750
# Hawaii Idaho Illinois Indiana Iowa
# 0.029173037 0.167543892 0.076948041 0.473148418 0.592667777
# Kansas Kentucky Louisiana Maine Maryland
# 0.774719240 0.439306768 0.190654770 0.086183357 0.284829878
# Massachusetts Michigan Minnesota Mississippi Missouri
# 0.428103056 0.188094295 0.644844800 0.163103449 0.719255739
# Montana Nebraska Nevada New Hampshire New Jersey
# 0.239142302 0.671350698 0.015766988 0.386053551 0.207503850
# New Mexico New York North Carolina North Dakota Ohio
# 0.012872885 0.008101305 0.872322617 0.457852394 0.092634247
# Oklahoma Oregon Pennsylvania Rhode Island South Carolina
# 0.561156131 0.158926944 0.261838286 0.482912999 0.229047767
# South Dakota Tennessee Texas Utah Vermont
# 0.095865021 0.237667483 0.121494852 0.349495632 0.256983459
# Virginia Washington West Virginia Wisconsin Wyoming
# 0.453608981 0.044780371 0.346223950 0.544998639 0.174849092
Rotation
The rotate() function rotates the samples and axes in the biplot by rotate.degrees degrees.
The reflect() function reflects the samples and axes in the biplot about an axis: x (horizontal reflection), y (vertical reflection) or xy (diagonal reflection).
The argument zoom = FALSE by default. If zoom = TRUE, a new graphical device is launched and the user is prompted to click on the desired upper left and lower right corners of the zoomed-in plot.
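A sketch of rotation in the pipeline (rotate.degrees is named above; the argument name for reflect() is not shown here, so it is omitted):
biplot(state.x77) |>
  CVA(classes = state.region) |>
  rotate(rotate.degrees = 90) |>   # rotate samples and axes by 90 degrees
  plot()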
CVA biplots for two classes
With two classes, \(\text{rank}(\mathbf{B}) = \min(p, G-1) = 1\), so only the first eigenvalue is non-zero: \[
\mathbf{\Lambda} = \text{diag}(\lambda, 0, \ldots, 0).
\] Total squared reconstruction error for the means: \(TSREM = tr\{ (\bar{\mathbf{X}}-\hat{\bar{\mathbf{X}}})(\bar{\mathbf{X}}-\hat{\bar{\mathbf{X}}})'\} = 0\)
Total squared reconstruction error for samples: \(TSRES = tr\{ ({\mathbf{X}}-\hat{{\mathbf{X}}})({\mathbf{X}}-\hat{{\mathbf{X}}})'\} >0\)
Minimise \(TSRES\) (Default option)
Alternative option: maximise the Bhattacharyya distance. For more details, see
le Roux, N. and Gardner-Lubbe, S., 2024. A two-group canonical variate analysis biplot for an optimal display of both means and cases. Advances in Data Analysis and Classification, pp.1-28.
CVA biplots for two classes
Solve \(\mathbf{BM=WM\Lambda}\) where \(\mathbf{M} = \begin{bmatrix} \mathbf{m}_1 & \mathbf{M}_2 \end{bmatrix}\).
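A small sketch verifying that \(\mathbf{B}\) has rank 1 for two classes, so that only \(\mathbf{m}_1\) is associated with a non-zero eigenvalue (the two-class split of state.region is purely illustrative):
X <- scale(state.x77, center = TRUE, scale = FALSE)
cls2 <- factor(state.region == "South")            # an illustrative two-class grouping
G <- model.matrix(~ cls2 - 1)
Xbar <- solve(t(G) %*% G) %*% t(G) %*% X
B <- t(Xbar) %*% (t(G) %*% G) %*% Xbar
qr(B)$rank                                         # 1: Lambda = diag(lambda, 0, ..., 0)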